Imputations for High Missing Rate Data in Covariates Via Semi-supervised Learning Approach

Abstract

Advancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as the simple average, k-nearest neighbor, multiple, and regression imputations may lead to results that are unstable or unable to be computed. Motivated by the concept of semi-supervised learning, we propose a novel approach with which to fill in missing values in covariates that have high missing rates. Specifically, we consider the missing and nonmissing subjects in any covariate as the unlabeled and labeled target outputs, respectively, and treat their corresponding responses as the unlabeled and labeled inputs. This innovative setting allows us to impute a large number of missing data without imposing any model assumptions. In addition, the resulting imputation has a closed form for continuous covariates, and it can be calculated efficiently. An analogous procedure is applicable for discrete covariates. We further employ nonparametric techniques to show the theoretical properties of the imputed covariates. Simulation studies and an online consumer finance example are presented to illustrate the usefulness of the proposed method.
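The setup described in the abstract (responses as inputs, the covariate as the target output, fitted on the nonmissing subjects) can be illustrated with a small sketch. The abstract does not state the estimator, so the code below assumes a Nadaraya-Watson kernel smoother as a stand-in for the nonparametric step; it is not the paper's method. The function name `kernel_impute`, the Gaussian kernel, the `bandwidth` parameter, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def kernel_impute(y, x, bandwidth=0.5):
    """Fill NaN entries of covariate x with a kernel-weighted average of the
    observed x values, weighting by closeness of the responses y.
    Assumed stand-in (Nadaraya-Watson), not the estimator from the paper."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    labeled = ~np.isnan(x)  # subjects whose covariate is observed
    if not labeled.any():
        raise ValueError("need at least one observed covariate value")

    # Scaled response differences: rows are unlabeled subjects, columns labeled ones.
    diff = (y[~labeled][:, None] - y[labeled][None, :]) / bandwidth

    # Gaussian kernel weights; subtracting the row maximum of the log-weights
    # keeps at least one weight equal to 1, so no row sums to zero.
    log_w = -0.5 * diff ** 2
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)

    x_filled = x.copy()
    x_filled[~labeled] = (w @ x[labeled]) / w.sum(axis=1)
    return x_filled

# Synthetic example (made-up data): roughly 80% of the covariate is missing.
rng = np.random.default_rng(0)
x_true = rng.normal(size=200)
y = 2.0 * x_true + rng.normal(scale=0.1, size=200)
x_obs = x_true.copy()
x_obs[rng.random(200) < 0.8] = np.nan
x_hat = kernel_impute(y, x_obs)
```

Because each imputed value is a weighted average of observed covariate values, it has a closed form and needs no iterative fitting, which matches the spirit of the closed-form claim for continuous covariates. The bandwidth would normally be tuned, for example by cross-validation on the labeled subjects.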

Related resources

Missing Data Imputation for Supervised Learning

This paper compares methods for imputing missing categorical data for supervised learning tasks. The ability of researchers to accurately fit a model and yield unbiased estimates may be compromised by missing data, which are prevalent in survey-based social science research. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on ...

Semi-supervised Data Representation via Affinity Graph Learning

We consider the general problem of utilizing both labeled and unlabeled data to improve data representation performance. A new semi-supervised learning framework is proposed by combining manifold regularization and data representation methods such as non-negative matrix factorization and sparse coding. We adopt unsupervised data representation methods as the learning machines because they do not ...

Analysis of presence-only data via semi-supervised learning approaches

Presence-only data occur in classification, which consist of a sample of observations from presence class and a large number of background observations with unknown presence/absence. Since absence data are generally unavailable, conventional semisupervised learning approaches are no longer appropriate as they tend to degenerate and assign all observations to presence class. In this article, we ...

Semi-supervised Learning Methods for Data Augmentation

The original goal of this project was to investigate the extent to which data augmentation schemes based on semi-supervised learning algorithms can improve classification accuracy in supervised learning problems. The objectives included determining the appropriate algorithms, customising them for the purposes of this project and providing their Matlab implementations. These algorithms were to b...

Data Selection for Semi-Supervised Learning

The real challenge in pattern recognition tasks and the machine learning process is to train a discriminator using labeled data and use it to distinguish between future data as accurately as possible. However, most real-world problems involve large amounts of data, for which labeling is cumbersome or even impossible. Semi-supervised learning is one approach to overcome these types of p...

Journal

Journal title: Journal of Business & Economic Statistics

Year: 2021

ISSN: 1537-2707, 0735-0015

DOI: https://doi.org/10.1080/07350015.2021.1922120